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Abstract 

Traditional fact checking by expert journalists cannot keep up with the enormous volume of information 
that is now generated online. Computational fact checking may significantly enhance our ability to evaluate 
the veracity of dubious information. Here we show that the complexities of human fact checking can be 
approximated quite well by finding the shortest path between concept nodes under properly defined 
semantic proximity metrics on knowledge graphs. Framed as a network problem this approach is feasible 
with efficient computational techniques. We evaluate this approach by examining tens of thousands of 
claims related to history, entertainment, geography, and biographical information using a public knowledge 
graph extracted from Wikipedia. Statements independently known to be true consistently receive higher 
support via our method than do false ones. These findings represent a significant step toward scalable 
computational fact-checking methods that may one day mitigate the spread of harmful misinformation. 


Introduction 

Online communication platforms, in particular social media, have created a situation in which the proverbial lie 
“can travel the world before the truth can get its boots on.” Misinformation |T|, astroturf ||2l, spam ||3l, and 
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outright fraud [4i| have become widespread. They are now seemingly unavoidable components of our online 
information ecology fjl] that jeopardize our ability as a society to make rapid and informed decisions 

miTiiiiiiiiia. 

While attempts to partially automate the detection of various forms of misinformation are burgeoning 

automated reasoning methods are hampered by the inherent ambiguity of language and by 
deliberate deception. However, under certain conditions, reliable knowledge transmission can take place online 
na. For example, Wikipedia, the crowd-sourced online encyclopedia, has been shown to be nearly as reliable 
as traditional encyclopedias, even though it covers many more topics ifTTIl . It now serves as a large-scale 
knowledge repository for millions of individuals, who can also contribute to its content in an open way. 
Vandalism, bias, distortions, and outright lies are frequently repaired in a matter of minutes |[T8]I . Its continuous 
editing process even indicates signs of collective human intelligence |[T9ll . 

Here we show that we can leverage any collection of factual human knowledge, such as Wikipedia, for 
automatic fact checking |[20l . Loosely inspired by the principle of epistemic closure GTl . we computationally 
gauge the support for statements by mining the connectivity patterns on a knowledge graph. Our initial focus is 
on computing the support of simple statements of fact using a large-scale knowledge graph obtained from 
Wikipedia. 

Knowledge Graphs 

Let a statement of fact be represented by a subject-predicate-object triple, e.g., (“Socrates,” “is a,” “person”). A 
set of such triples can be combined to produce a knowledge graph (KG), where nodes denote entities (i.e., 
subjects or objects of statements), and edges denote predicates. Given a set of statements that has been 
extracted from a knowledge repository — such as the aforementioned Wikipedia — the resulting KG network 
represents all factual relations among entities mentioned in those statements. Given a new statement, we expect 
it to be true if it exists as an edge of the KG, or if there is a short path linking its subject to its object within the 
KG. If, however, the statement is untrue, there should be neither edges nor short paths that connect subject and 
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object. 

In a KG distinct paths between the same subject and object typically provide different factual support for the 
statement those nodes represent, even if the paths contain the same number of intermediate nodes. For 
example, paths that contain generic entities, such as “United States” or “Male,” provide weaker support 
because these nodes link to many entities and thus yield little specific information. Conversely, paths 
comprised of very specific entities, such as “positronic flux capacitor” or “terminal deoxynucleotidyl 
transferase,” provide stronger support. A fundamental insight that underpins our approach is that the definition 
of path length used for fact checking should account for such information-theoretic considerations. 

To test our method we use the DBpedia database Il22ll . which consists of all factual statements extracted from 
Wikipedia’s “infoboxes” (see Fig.[T|a)). From this data we build the large-scale Wikipedia Knowledge Graph 
(WKG), with 3 million entity nodes linked by approximately 23 million edges (see Materials and Methods). 
Since we use only facts within infoboxes, the WKG contains the most uncontroversial information available on 
Wikipedia. This conservative approach is employed to ensure that our process relies as much as possible on a 
human-annotated, collectively-vetted factual basis. The WKG could be augmented with automatic methods to 
infer facts from text and other unstructured sources available online. Indeed, other teams have proposed 
methods to infer knowledge from text ||23]| to be employed in large and sophisticated rule-based inference 
models Il2^l25ll2^ . Here we focus on the feasibility of automatic fact checking using simple network models 
that leverage DBpedia. For this initial goal, we do not need to enhance the WKG, but such improvements can 
later be incorporated. 

Semantic Proximity from Transitive Closure 

Let the WKG be an undirected graph G = {V,E) where U is a set of concept nodes and is a set of predicate 
edges (see Materials and Methods). Two nodes n, m G U are said to be adjacent if there is an edge between 
them {v, w) G E. They are said to be connected if there a sequence of n > 2 nodes v = vi,V 2 ,.. - Vn = w, 
such that, for i = 1,..., n — 1 the nodes Vi and Uj+i are adjacent. The transitive closure of G is G* = (U, E*) 
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where the set of edges is closed under adjacency, that is, two nodes are adjacent in G* ijf they are connected in 
G via at least one path. This standard notion of closure has been extended to weighted graphs, allowing 
adjacency to be generalized by measures of path length ||23, such as the semantic proximity for the WKG we 
introduce next. 

The truth value r(e) G [0,1 ] of a new statement e = (s,p, o) is derived from a transitive closure of the WKG. 
More specifically, the truth value is obtained via a path evaluation function: r(e) = max >V(Ps,o)- This 
function maps the set of possible paths connecting s and o to a truth value r. A path has the form 
Ps,o = viV 2 ■ ■ ■ Vn, where Vi is an entity node, {vi, Vi+i) is a edge, n is the path length measured by the number 
of its constituent nodes, vi = s, and Vn = o. Various characteristics of a path can be taken as evidence in 
support of the truth value of e. Here we use the generality of the entities along a path as a measure of its length, 
which is in turn aggregated to define a semantic proximity. 


W(P,,o) = W(ni ...Vn) 


n—1 


-1 


1 + '^logk{vi) 
i=2 


( 1 ) 


where k{v) is fhe degree of enfify v, i.e., fhe number of WKG sfafemenfs in which if parficipafes; if therefore 
measures the generality of an entity. If e is already present in the WKG (i.e., there is an edge between s and o), 
it should obviously be assigned maximum truth. In fact W = 1 when n = 2 because there are no intermediate 
nodes. Otherwise an indirect path of length n > 2 may be found via other nodes. The truth value T(e) 
maximizes the semantic proximity defined by Eq. [T] which is equivalenf fo finding fhe shorfesf pafh befween s 
and o |[26l , or the one that provides the maximum information content ETl [28]| in the WKG. The transitive 
closure of weighted graphs equivalent to finding fhe shorfesf pafhs befween every pair of nodes is also known 
as fhe metric closure ll26l . Fig.[TJb) depicfs an example of a shorfesf path on the WKG for a statement that 
yields a low truth value. 

Note that in this specific formulation we disregard fhe semantics of fhe predicate, therefore we are only able to 
check statements with the simplest predicates, such as “is a”; negation, for instance, would require a more 
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sophisticated definition of path length. 

Alternative definitions of r(e) are of course possible. Instead of shortest paths, one could use a different 
optimization principle, such as widest bottleneck, also known as the ultra-metric closure 1261, which 
corresponds to maximizing the path evaluation function Wu'- 


=Wu{vi...Vn) = 


= 


n = 2 


[l + max"^ 2 ^ {log k (fj)}] ^ n > 2. 


( 2 ) 


Or it could be possible to retain the original directionality of edges and have a directed WKG instead of an 
undirected one. As described next, we evaluated alternative definitions of T(e) and found Eq.[^to perform best. 


Results 

Calibration 

Our fact-checking method requires that we define a measure of path semantic proximity by selecting a 
transitive closure algorithm (the shortest paths of Eq. [T]or the widest bottleneck paths of Eq. and a directed 
or undirected WKG representation. To evaluate these four combinations empirically, let us attempt to infer the 
party affiliation of US Congress members. In other words, we want to compute the support of statements like 
“x is a member of y” where x is a member of Congress and y is a political party. We consider all members of 
the 112th US Congress that are affiliated with either the Democratic or Republican party (Senate: N = 100; 
House: N = 445). We characterize each member of Congress with its semantic proximity to all nodes in the 
WKG that represent ideologies. This yields an N x M feature matrix J^tc for each of the four transitive closure 
methods. The top panel of Eig. [^illustrates the proximity network obtained from IFtc that connects members of 
the 112th Congress and their closest ideologies, as computed using Eq.[T] A high degree of ideological 
polarization can be observed in the WKG, consistent with blogs ll29l and social media ||30l. 
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We feed Ftc into off-the-shelf classifiers (see Materials and Methods). As shown in Table [T] the metric closure 
on the undirected graph gives the most accurate results. Therefore, we continue to use this combination in our 
semantic proximity computations when performing the validation tasks described below. 

To evaluate the overall performance of the calibrated model, we also compared it against DW-NOMINATE, the 
state of the art in political classification ll^ . This model is not based on data from a knowledge graph, but on 
explicit information about roll-call voting patterns. Comparing our classification results with such a baseline is 
also useful to gauge the quality of the latent information contained in the WKG for the task of political 
classification. As shown in the bottom panel of Fig. a Random Forests classifier frained on our frufh values 
mafches fhe performance of DW-NOMINATE. 

Graph search vs infohoxes only 

Mosf of fhe WKG informalion fhaf our facf checker exploifs is provided by indirecf pafhs (i.e., comprising 
n > 2 nodes). To demonstrate this, we compare the calibrated model of Eq.[T]to the fact checker’s performance 
with only the information in the infohoxes. 

In practice, we compute an additional feature matrix F\^, using the same sequence of steps outlined in the 
calibration phase, but additionally constraining the shortest path algorithm to use only paths (if any) with 
exactly n = 2 nodes, i.e., direct edges. Thus F\y encodes only the information of the infohoxes of the 
politicians. The results from 10-fold cross validation using Ftc and F\^ are shown in Table|^ The same 
off-the-shelf classifiers, fhis fime frained on Fh, perform only slighfly heifer lhan random, fhus confirming fhaf 
fhe frufh signal is yielded by fhe sfrucfure of indirecf connections in fhe WKG. 

Validation on factual statements 

We lest our fact-checking method on tasks of increasing difficulty, and begin by considering simple factual 
statements in four subject areas related to entertainment, history, and geography. We evaluate statements of the 
form “dj directed rrij,” “pi was married to Sj” and “cj is the capital of r^,” where di is a director, nij is a 
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movie, pi is a US president, sj is the spouse of a US president, Cj is a city, and rj is a country or US state. By 
considering all combinations of subjects and objects in these classes, we obtain matrices of statements (see 
Materials and Methods). Many of them, such as “Rome is the capital of India,” are false. Others, such as 
“Rome is the capital of Italy,” are true. To prevent the task from being trivially easy, we remove any edges that 
represent true statements in our test set from the graph. Fig. shows the matrices obtained by running the fact 
checker on the factual statements. Let e and e' be a true and false statement, respectively, from any of the four 
subject areas. To show that our fact checker is able to correctly discriminate between true and false statements 
with high accuracy, we estimate the probability that r (e) > r (e'). To do so we plot the ROC curve of the 
classifier (see Fig.|^ since the area under the ROC curve is equivalent to this probability |[32l . With this 
method we estimate that, in the four subject areas, true statements are assigned higher truth values than false 
ones with probability 95%, 98%, 61%, and 95%, respectively. 

Validation on annotated corpus 

In a second task, we consider an independent corpus of novel statements extracted from the free text of 
Wikipedia and annotated as true or false by human raters ||3^ (see Materials and Methods). We compare the 
human ratings with the truth values provided by our automatic fact checker (Fig.[^. Although the statements 
under examination originate from Wikipedia, they are not usually represented in the WKG, which is derived 
from the infoboxes only. When a statement is present in the WKG, the link is removed. The information 
available in the WKG about the entities involved in these particular statements is very sparse, therefore this task 
is more difficult than the previous case. 

We find fhaf fhe frufh values compufed by fhe facl checker are positively correlafed fo fhe average rafings given 
by fhe human evaluators. Tableshows fhe posifive correlafion befween GREC human annofafions and our 
compufafional frufh scores. 

As shown in Fig.|^ our facf checker yields consisfenfly higher support for frue sfafemenfs fhan false ones. 
Using only information in fhe infoboxes however yields worse performance, closer fo random choice: 
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AUROC = 0.47 and 0.52 for the ‘degree’ and ‘institution’ predicates, respectively. We conclude that the fact 
checker is able to integrate the strength of indirect paths in the WKG, which pertain to factual information not 
originally included in the infoboxes. 

Discussion 

These results are both encouraging and exciting: a simple shortest path computation maximizing information 
content can leverage an existing body of collective human knowledge to assess the truth of new statements. In 
other words, the important and complex human task of fact checking can be effectively reduced to a simple 
network analysis problem, which is easy to solve computationally. Our approach exploits implicit information 
from the topology of the WKG, which is different from the statements explicitly contained in the infoboxes. 
Indeed, if we base our assessment only on direct edges in the WKG, performance decreases significantly. This 
demonstrates that much of the correct measurement of the truthfulness of statements relies on indirect paths. 
Because there are many ways to compute shortest paths in distance graphs, or transitive closures in weighted 
graphs Il26ll . there is ample room for improvement on this method. 

We live in an age of overabundant and ever-growing information, but much of it is of questionable veracity 
mniiMi- Establishing the reliability of information in such circumstances is a daunting but critical challenge. 
Our results show that network analytics methods, in conjunction with large-scale knowledge repositories, offer 
an exciting new opportunity towards automatic fact-checking methods. As the importance of the Internet in our 
everyday lives grows, misinformation such as panic-inducing rumors, urban legends, and conspiracy theories 
can efficiently spread online in variety of new ways l|8l|5l. Scalable computational methods, such as the one we 
demonstrate here, may hold the key to mitigate the societal effects of these novel forms of misinformation. 


Materials and Methods 


Wikipedia Knowledge Graph 

To obtain the WKG we downloaded and parsed RDF triples data from the DBpedia project (dbpedia . org). 
We used three datasets of triples to build the WKG: the “Types” dataset, which contains subsumption triples of 
the form (subject, “is-a,” Class), where Class is a category of the DBpedia ontology; the “Properties” dataset, 
which contains triples extracted from infoboxes; and the DBpedia ontology, from which we used all triples with 
predicate “subClassOf.” This last data was used to reconstruct the full ontological hierarchy of the graph. We 
then discarded the predicate part of each triple and conflated all triples having the same subject and object, 
obtaining an edge list. In this process, we discarded all triples whose subject or object belonged to external 
namespaces (e.g., FOAF and schema . org). We also discarded all triples from the “Properties” dataset whose 
object was a date or any other kind of measurement (e.g., “Aristotle,” “birthYear,” “384 B.C.”), because by 
definition they never appear as subjects in other triples. 

Ideological classification of the US Congress 

To get a list of ideologies we consider the “Ideology” category in the DBpedia ontology and look up in the 
WKG all nodes V connected to it by means of a statement (Y, “is-a,” “Ideology”). We found M = 819 such 
nodes (see SI appendix for a complete list of the ideologies). Given a politician X and an ideology Y we then 
compute the truth value of the statement “X endorses ideology Y.” To perform the classification, we use two 
standard classifier algorithms: /c-Nearest Neighbors Il35ll and Random Forests Ohll . To assess the classification 
accuracy we computed F-score and area under Receiver Operating Characteristic (ROC) curve using 10-fold 
cross-validation. 
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Simple factual statements 


We formed simple statements by combining each of N subject entities with each of N object entities. We 
performed this procedure in four subject areas: (1) Academy Awards for Best Movie (N = 59), (2) US 
presidential couples (N = 17), (3) US state capitals (N = 48), and (4) world capitals (N = 187). For directors 
with more than one award, only the first award was used. All data were taken from Wikipedia (see S/ 
appendix). To make the test fair, if a triple indicating a true statement was already present in the WKG, we 
removed it from the graph before computing the truth value. This step of the evaluation procedure is typical of 
link prediction algorithms Il37]l . 

Independent corpus of statements 

The second ground truth dataset is based on the Google Relation Extraction Corpus (GREG) ll3^ . Eor 
simplicity we focus on two types of statements, about education degrees (N = 602) and institutional 
affiliations {N = 10, 726) of people, respectively. Each triple in the GREG comes with truth ratings by five 
human raters (Eigure|^c)), so we map the ratings into an ordinal scale between —5 (all raters replied ‘No’) and 
+5 (all raters replied ‘Yes’), and compare them to the truth values computed by the fact checker. The subject 
entities of several triples in the GREG appear in only a handful of links in the WKG, limiting the chances that 
our method can find more than one path. Therefore we select from the two datasets only triples having a 
subject with degree k > 3. Similarly to the previous task, if the statement is already present in the WKG, we 
remove the corresponding triple before computing the truth value. 
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Table 1: Transitive closure calibration. Area under ROC eurve of two elassifiers, random forests (RF) and 
/c-nearest neighbors (/c-NN) on the ideologieal elassifieation task. 




Direeted 

Undireeted 



k-NN 

RF 

k-NN 

RF 

House 

Metrie 

96 

99 

97 

99 

Ultra-metrie 

56 

57 

53 

57 

Senate 

Metrie 

70 

100 

96 

100 

Ultra-metrie 

49 

39 

70 

61 


Table 2: Ideological classification results. Out-of-sample F-seore and Area Under Reeeiver Operating Char- 
aeteristie (AUROC) of random forest (RF) and fc-nearest neighbors (/c-NN) elassifiers trained on truth seores 
eomputed by the faet eheeker, using either the transitive elosure or solely information from infoboxes. 




RF 

/c-NN 

Dataset 

F-seore AUROC 

F-seore 

AUROC 

Transitive closure Ftc 

Senate 

0.99 

1.00 

0.91 

0.96 

House 

0.99 

1.00 

0.90 

0.97 

Infoboxes F \, 

Senate 

0.66 

0.46 

0.62 

0.54 

House 

0.54 

0.66 

0.68 

0.54 


Table 3: Agreement between fact checker and human raters. We use rank-order eorrelation eoeffieients 
(Kendall’s r and Spearman’s p) to assess whether the seores are eorrelated to the ratings. Signifieanee tests rule 
out the null hypothesis that the eorrelation eoeffieients are zero. 


Relation 

P 

p-value 

T 

p-value 

Degree 

0.17 

2 X 10-5 

0.13 

10 X 1-5 

Institution 

0.09 

4 X 10-^9 

0.07 

1 X 10-24 
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U.S. President Barack Obama in front of the 
Resolute desk in the Ovai Office of the White 


House. December 6.2012 
44th President of the United States 
litcumbent 
Assumed office 
January 20,2009 
Vice President Joe Biden 
Preceded by George W. Bush 

United States Sertator 
from lllirtois 
In office 

January 3.2005 - November 16,2008 
Preceded by Peter Fitzgerald 
Succeeded by Roland Burris 

Member of the lllirtois Senate 
from the i3th District 
In office 

January 8.1997 - November 4, 2004 
Preceded by Alice Palmer 
Succeeded by Kwame Raoul 
Personal details 

Born Barack Hussein Obama ll 

August4.1961 (age S2) 
Honolulu. Hawaii. U.S. 
Nationality American 

Political party Democratic 



Figure 1: Using Wikipedia to fact-check statements, (a) To populate the knowledge graph with facts we use 
structured information contained in the ‘infoboxes’ of Wikipedia articles (in the figure, the infobox of the article 
about Barack Obama), (b) Using the Wikipedia Knowledge Graph, computing the truth value of a subject- 
predicate-object statement amounts to finding a path between subject and object entities. In the diagram we plot 
the shortest path returned by our method for the statement “Barack Obama is a muslim.” The path traverses 
high-degree nodes representing generic entities, such as Canada, and is assigned a low truth value. 
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ylon-interventionism 


^ Right-libertarianism 
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Figure 2: Ideological classification of the US Congress based on truth values, (a) Ideological network of 
the 112th US Congress. The plot shows a subset of the WKG constituted by paths between Democratic or 
Republican members of the 112th US Congress and various ideologies. Red and blue nodes correspond to 
members of Congress, gray nodes to ideologies, and white nodes to vertices of any other type. The position of 
the nodes is computed using a force-directed layout lIMl . which minimizes the distance between nodes connected 
by an edge weighted by a higher truth value. For clarity only the most significant paths, whose values rank in 
the top 1% of truth values, are shown, (b) Ideological classification of members of the 112th US Congress. The 
plot shows on the x axis the party label probability given by a Random Forest classification model trained on 
the truth values computed on the WKG, and on the y axis the reference score provided by DW-NOMINATE. Red 
triangles are members of Congress affiliated to the Republican party and blue circles to the Democratic party. 
Histograms and density estimates of the two marginal distributions, color-coded by actual affiliation, are shown 
on the top and right axes. 
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Figure 3: Automatic truth assessments for simple factual statements. In each confusion matrix, rows rep¬ 
resent subjects and columns represent objects. The diagonals represent true statements. Higher truth values 
are mapped to colors of increasing intensity, (a) Films winning the Oscar for Best Movie and their directors, 
grouped by decade of award (see the complete list in the SI appendix), (b) US presidents and their spouses, 
denoted by initials, (c) US states and their capitals, grouped by US Census Bureau-designated regions, (d) 
World countries and their capitals, grouped by continent. 
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False Positive Rate 

Figure 4: Receiver Operating Characteristic for the multiple questions task. For each confusion matrix 
depicted in Fig.j^we compute ROC curves where true statements correspond to the diagonal and false statements 
to off-diagonal elements. The red dashed line represents the performance of a random classifier. 
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Figure 5: Real-world fact-checking scenario, (a) A document from the ground truth corpus, (b) Statement to 
fact-check: Did Steve Tesich graduate from Indiana University, Bloomington? This information is not present 
in the infobox, and thus it is not part of the WKG. (c) Annotations from five human raters. In this case, the 
majority of raters believe that the statement is true, and thus we consider it as such for classification purposes, 
(d) Receiver operating characteristic (ROC) curve of the classification for subject-predicate-object statements in 
which the predicate is “institution” (e.g., “Albert Einstein,” “institution,” “Institute for Advanced Studies”). A 
true positive rate above the false positive rate (dashed line), and correspondingly an area under the curve (AUC) 
above 0.5, indicate better than random performance, (e) ROC curve for statements with “degree” predicate (e.g., 
“Albert Einstein,” “degree,” “University Diploma”). 
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